fix(orchestrator): stop failed phase-transition backend #21

Draft: cursor[bot] wants to merge 1 commit into main from cursor/critical-bug-inspection-d3c4

cursor[bot] commented May 10, 2026

Bug and impact

During a mid-run phase transition, _rebuild_backend_for_phase stopped the old backend and then started a fresh one. If initialize(), prompt rendering, or start_session() failed after the fresh backend had started, the worker still held only the old backend reference, so the fresh backend was never stopped. For subprocess-backed agents such as Codex app-server, repeated transition failures could leak orphaned processes and exhaust resources.

Root cause

The fresh backend startup path was not wrapped in cleanup-on-failure logic, and the caller rebinds client only after _rebuild_backend_for_phase returns successfully, so on failure no reference to the fresh backend survives.

Fix

Wrap fresh backend start/initialize/session setup in a try block and call new_client.stop() before re-raising any failure. Log but do not mask cleanup failures.
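
A minimal sketch of the cleanup-on-failure pattern described above, not the actual diff. FakeBackend, make_client, and render_prompt are hypothetical stand-ins introduced for illustration; only the start/initialize/start_session/stop method names come from this description. It reads "log but do not mask" as: a failing stop() is logged, and the original setup error still propagates.

```python
import logging

logger = logging.getLogger(__name__)


class FakeBackend:
    """Hypothetical stand-in for a backend client; not the project's real class."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on  # name of the step that should raise, if any
        self.started = False
        self.stopped = False

    def _maybe_fail(self, step):
        if self.fail_on == step:
            raise RuntimeError(f"{step} failed")

    def start(self):
        self._maybe_fail("start")
        self.started = True

    def initialize(self):
        self._maybe_fail("initialize")

    def start_session(self, prompt):
        self._maybe_fail("start_session")

    def stop(self):
        self.stopped = True


def rebuild_backend_for_phase(old_client, make_client, render_prompt):
    """Stop the old backend, bring up a fresh one, and stop it if setup fails."""
    old_client.stop()
    new_client = make_client()
    try:
        new_client.start()
        new_client.initialize()
        new_client.start_session(render_prompt())
    except Exception:
        try:
            new_client.stop()
        except Exception:
            # Log the cleanup failure; the original setup error still propagates.
            logger.exception("failed to stop fresh backend after setup error")
        raise
    return new_client
```

With this shape, a failure in any setup step leaves the fresh backend stopped even though the caller never receives a reference to it. The sketch assumes stop() is safe to call on a backend whose start() did not complete.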

Validation

  • python3 -m pytest tests/test_orchestrator_phase_transition.py -> 10 passed
  • PATH="/tmp/symphony-test-bin:$PATH" python3 -m pytest -> 275 passed, 2 skipped

Note: the base container has python3 but no python, so the full-suite rerun used a temporary /tmp/symphony-test-bin/python -> python3 shim for the existing doctor test that checks python -m symphony.mock_codex.

Co-authored-by: Agentic-Worker <cskwork@users.noreply.github.com>